
    Learning to Extract Motion from Videos in Convolutional Neural Networks

    This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures that use motion without resorting to an external algorithm, e.g. for recognition in videos. We derive our network architecture from signal-processing principles to provide the desired invariances to image contrast, phase and texture. We constrain weights within the network to enforce strict rotation invariance and substantially reduce the number of parameters to learn. We demonstrate end-to-end training on only 8 sequences of the Middlebury dataset, orders of magnitude less data than competing CNN-based motion estimation methods require, and obtain performance comparable to classical methods on the Middlebury benchmark. Importantly, our method outputs a distributed representation of motion that can represent multiple, transparent motions and dynamic textures. Our contributions on network design and rotation invariance offer insights that are not specific to motion estimation.
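The weight constraint for strict rotation invariance can be illustrated with a minimal sketch (not the paper's actual architecture): parameterize a 2-D filter by a radial profile so that its weights depend only on distance from the centre, which both enforces rotation invariance and shrinks the parameter count. The function name and profile values are hypothetical.

```python
import numpy as np

def radial_filter(profile):
    """Build a square 2-D filter whose weights depend only on the distance
    from the centre. Such a filter is strictly rotation invariant and is
    described by len(profile) parameters instead of n*n."""
    n = 2 * len(profile) - 1          # filter side length
    c = len(profile) - 1              # centre index
    yy, xx = np.mgrid[0:n, 0:n]
    r = np.hypot(yy - c, xx - c)      # distance of each tap from the centre
    radii = np.arange(len(profile), dtype=float)
    return np.interp(r, radii, profile, right=0.0)

k = radial_filter([1.0, 0.5, 0.1])
# The resulting 5x5 kernel is unchanged by rotation (e.g. by 90 degrees).
```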

    What the ‘Moonwalk’ Illusion Reveals about the Perception of Relative Depth from Motion

    When one visual object moves behind another, the object farther from the viewer is progressively occluded and/or disoccluded by the nearer object. For nearly half a century, this dynamic occlusion cue has been thought to be sufficient by itself for determining the relative depth of the two objects. This view is consistent with the self-evident geometric fact that the surface undergoing dynamic occlusion is always farther from the viewer than the occluding surface. Here we use a contextual manipulation of a previously known motion illusion, which we refer to as the ‘Moonwalk’ illusion, to demonstrate that the visual system cannot determine relative depth from dynamic occlusion alone. Indeed, in the Moonwalk illusion, human observers perceive a relative depth contrary to the dynamic occlusion cue. However, the perception of the expected relative depth is restored by contextual manipulations unrelated to dynamic occlusion. On the other hand, we show that an Ideal Observer can determine relative depth using dynamic occlusion alone in the same Moonwalk stimuli, indicating that the dynamic occlusion cue is, in principle, sufficient for determining relative depth. Our results indicate that in order to correctly perceive relative depth from dynamic occlusion, the human brain, unlike the Ideal Observer, needs additional segmentation information that delineates the occluder from the occluded object. Thus, neural mechanisms of object segmentation must, in addition to motion mechanisms that extract information about relative depth, play a crucial role in the perception of relative depth from motion.

    Separable time-causal and time-recursive spatio-temporal receptive fields

    We present an improved model and theory for time-causal and time-recursive spatio-temporal receptive fields, obtained by a combination of Gaussian receptive fields over the spatial domain and first-order integrators, or equivalently truncated exponential filters, coupled in cascade over the temporal domain. Compared to previous spatio-temporal scale-space formulations in terms of non-enhancement of local extrema or scale invariance, these receptive fields are based on different scale-space axiomatics over time, ensuring non-creation of new local extrema or zero-crossings with increasing temporal scale. Specifically, extensions are presented for parameterizing the intermediate temporal scale levels, analysing the resulting temporal dynamics and transferring the theory to a discrete implementation in terms of recursive filters over time.
    Comment: 12 pages, 2 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:1404.203
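The first-order integrators coupled in cascade have a direct time-recursive implementation. A minimal sketch, following the standard discrete recurrence out[t] = out[t-1] + (in[t] - out[t-1]) / (1 + mu) (parameter names are ours, not necessarily the paper's):

```python
import numpy as np

def first_order_integrator(signal, mu):
    """Discrete truncated-exponential (first-order integrator) filter,
    run time-recursively: out[t] = out[t-1] + (in[t] - out[t-1]) / (1 + mu)."""
    out = np.empty(len(signal))
    prev = 0.0
    for t, x in enumerate(signal):
        prev = prev + (x - prev) / (1.0 + mu)
        out[t] = prev
    return out

def temporal_cascade(signal, mus):
    """Couple several integrators in cascade to reach a coarser temporal scale."""
    for mu in mus:
        signal = first_order_integrator(signal, mu)
    return signal

# Impulse response of a three-stage cascade: unit mass is preserved and the
# response peak is delayed, as expected for a causal temporal smoothing kernel.
impulse = np.zeros(200)
impulse[0] = 1.0
resp = temporal_cascade(impulse, [1.0, 2.0, 4.0])
```

Each stage has unit DC gain, so the cascade's impulse response still sums to one; only the temporal scale (and delay) grows.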

    Seeing Tree Structure from Vibration

    Humans recognize object structure from both their appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe object appearance only. There are particular scenarios, however, where neither appearance nor spatial-temporal motion signals are informative: occluding twigs may look connected and have almost identical movements, though they belong to different, possibly disconnected branches. We propose to tackle this problem through spectrum analysis of motion signals, because vibrations of disconnected branches, though visually similar, often have distinctive natural frequencies. We propose a novel formulation of tree structure based on a physics-based link model, and validate its effectiveness by theoretical analysis, numerical simulation, and empirical experiments. With this formulation, we use nonparametric Bayesian inference to reconstruct tree structure from both spectral vibration signals and appearance cues. Our model performs well in recognizing hierarchical tree structure from real-world videos of trees and vessels.
    Comment: ECCV 2018. The first two authors contributed equally to this work. Project page: http://tree.csail.mit.edu
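The core idea, that disconnected branches can share appearance yet differ in natural frequency, can be sketched with a toy spectral analysis of tracked displacement signals. The data are synthetic and the function name is ours, not the authors':

```python
import numpy as np

def dominant_frequency(displacement, fps):
    """Return the dominant vibration frequency (Hz) of a tracked point's
    displacement signal, via the peak of its magnitude spectrum."""
    sig = displacement - displacement.mean()        # remove the DC component
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    return freqs[spectrum.argmax()]

# Two synthetic "twigs": similar-looking oscillations, distinct natural
# frequencies, sampled at 30 frames per second for 10 seconds.
fps = 30.0
t = np.arange(0, 10, 1.0 / fps)
branch_a = np.sin(2 * np.pi * 2.0 * t)           # natural frequency 2.0 Hz
branch_b = np.sin(2 * np.pi * 3.5 * t + 0.7)     # different branch, 3.5 Hz
```

The spectra separate the two signals even though their time-domain appearance is nearly identical.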

    Spatial Stereoresolution for Depth Corrugations May Be Set in Primary Visual Cortex

    Stereo “3D” depth perception requires the visual system to extract binocular disparities between the two eyes' images. Several current models of this process, based on the known physiology of primary visual cortex (V1), do this by computing a piecewise-frontoparallel local cross-correlation between the left and right eye's images. The size of the “window” within which detectors examine the local cross-correlation corresponds to the receptive field size of V1 neurons. This basic model has successfully captured many aspects of human depth perception. In particular, it accounts for the low human stereoresolution for sinusoidal depth corrugations, suggesting that the limit on stereoresolution may be set in primary visual cortex. An important feature of the model, reflecting a key property of V1 neurons, is that the initial disparity encoding is performed by detectors tuned to locally uniform patches of disparity. Such detectors respond better to square-wave depth corrugations, since these are locally flat, than to sinusoidal corrugations, which are slanted almost everywhere. Consequently, for any given window size, current models predict better performance for square-wave disparity corrugations than for sine-wave corrugations at high amplitudes. We have recently shown that this prediction is not borne out: humans perform no better with square-wave than with sine-wave corrugations, even at high amplitudes. The failure of this prediction raised the question of whether stereoresolution may actually be set at later stages of cortical processing, perhaps involving neurons tuned to disparity slant or curvature. Here we extend the local cross-correlation model to include existing physiological and psychophysical evidence indicating that larger disparities are detected by neurons with larger receptive fields (a size/disparity correlation). 
    We show that this simple modification succeeds in reconciling the model with human results, confirming that stereoresolution for disparity gratings may indeed be limited by the size of receptive fields in primary visual cortex.
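The windowed cross-correlation at the heart of these models can be sketched in one dimension. This is a toy, not the authors' implementation: slide a fixed window over candidate disparities and keep the best-matching shift (the piecewise-frontoparallel assumption is that one shift fits the whole window). The stimulus here is a single feature point, chosen so the answer is unambiguous.

```python
import numpy as np

def disparity_by_xcorr(left, right, x, window, max_disp):
    """Estimate the disparity at column x of a 1-D image pair by maximising
    the zero-mean cross-correlation inside a fixed-size window."""
    patch = left[x:x + window]
    patch = patch - patch.mean()
    best_d, best_score = 0, -np.inf
    for d in range(max_disp + 1):
        cand = right[x - d:x - d + window]
        cand = cand - cand.mean()
        score = float(np.dot(patch, cand))
        if score > best_score:
            best_d, best_score = d, score
    return best_d

row = np.zeros(100)
row[13] = 1.0                 # a single feature point in the left eye's image
left = row
right = np.roll(row, -3)      # right eye's image: the feature shifted by 3 px
d_hat = disparity_by_xcorr(left, right, x=8, window=10, max_disp=6)
```

A larger window averages over more image structure, which is exactly why a fixed window size limits resolution for rapidly varying (sinusoidal) disparity patterns.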

    The Effects of Vitamin D Receptor Silencing on the Expression of LVSCC-A1C and LVSCC-A1D and the Release of NGF in Cortical Neurons

    Recent studies have suggested that vitamin D can act on cells in the nervous system. Associations between polymorphisms in the vitamin D receptor (VDR), age-dependent cognitive decline, and insufficient serum 25-hydroxyvitamin D(3) levels in Alzheimer's patients and elderly people with cognitive decline have been reported. We have previously shown that amyloid β (Aβ) treatment eliminates VDR protein in cortical neurons. These results suggest a potential role for vitamin D and vitamin D-mediated mechanisms in Alzheimer's disease (AD) and neurodegeneration. Vitamin D has been shown to down-regulate the L-type voltage-sensitive calcium channels, LVSCC-A1C and LVSCC-A1D, and up-regulate nerve growth factor (NGF). However, the expression of these proteins when VDR is repressed is unknown. The aim of this study is to investigate LVSCC-A1C and LVSCC-A1D expression levels and NGF release in VDR-silenced primary cortical neurons prepared from Sprague-Dawley rat embryos. qRT-PCR and western blots were performed to determine VDR, LVSCC-A1C and -A1D expression levels. NGF and cytotoxicity levels were determined by ELISA. Apoptosis was determined by TUNEL. Our findings illustrate that LVSCC-A1C mRNA and protein levels increased rapidly in cortical neurons when VDR was down-regulated, whereas LVSCC-A1D mRNA and protein levels did not change, and NGF release decreased in response to VDR down-regulation. Although vitamin D regulates LVSCC-A1C through VDR, it may not regulate LVSCC-A1D through VDR. Our results indicate that suppression of VDR disrupts LVSCC-A1C and NGF production. In addition, when VDR is suppressed, neurons could be vulnerable to aging and neurodegeneration; combined with Aβ toxicity, this may explain some of the events that occur during neurodegeneration.

    Robust Models for Optic Flow Coding in Natural Scenes Inspired by Insect Biology

    The extraction of accurate self-motion information from the visual world is a difficult problem that has been solved very efficiently by biological organisms using non-linear processing. Previous bio-inspired models for motion detection based on a correlation mechanism have been dogged by issues arising from their sensitivity to undesired properties of the image, such as contrast, which vary widely between images. Here we present a model with multiple levels of non-linear dynamic adaptive components based directly on the known or suspected responses of neurons within the visual motion pathway of the fly brain. By testing the model under realistic high-dynamic-range conditions, we show that the addition of these elements makes the motion detection model robust across a large variety of images, velocities and accelerations. Furthermore, the performance of the entire system exceeds the sum of the incremental improvements offered by the individual components, indicating beneficial non-linear interactions between processing stages. The algorithms underlying the model can be implemented in either digital or analog hardware, including neuromorphic analog VLSI, but defy an analytical solution due to their dynamic non-linear operation. This algorithm has potential applications in the development of miniature autonomous systems in defense and civilian roles, including robotics, miniature unmanned aerial vehicles and collision-avoidance sensors.
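The correlation mechanism such models build on is the classic Hassenstein-Reichardt elementary motion detector. A bare-bones sketch, without the adaptive non-linear stages the paper adds, on synthetic photoreceptor signals:

```python
import numpy as np

def reichardt_detector(left, right, delay=1):
    """Elementary motion detector: correlate each receptor's signal with a
    delayed copy of its neighbour's; the difference of the two
    mirror-symmetric half-detectors gives a direction-signed response."""
    out = np.zeros(len(left))
    out[delay:] = (left[:-delay] * right[delay:]      # tuned to rightward motion
                   - right[:-delay] * left[delay:])   # tuned to leftward motion
    return out

# A grating drifting rightward: the right receptor sees the left receptor's
# signal one time-step later.
t = np.arange(200)
left_pr = np.sin(0.3 * t)
right_pr = np.sin(0.3 * (t - 1))
response = reichardt_detector(left_pr, right_pr)
```

The mean response is positive for rightward motion and flips sign when the inputs are swapped; its raw amplitude, however, scales with the square of contrast, which is exactly the sensitivity the paper's adaptive stages are designed to suppress.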

    Computational Models for Prediction of Yeast Strain Potential for Winemaking from Phenotypic Profiles

    Saccharomyces cerevisiae strains from diverse natural habitats harbour a vast amount of phenotypic diversity, driven by interactions between yeast and the respective environment. In grape juice fermentations, strains are exposed to a wide array of biotic and abiotic stressors, which may lead to strain selection and generate naturally arising strain diversity. Certain phenotypes are of particular interest for the winemaking industry and could be identified by screening a large number of different strains. The objective of the present work was to use data mining approaches to identify the phenotypic tests that are most useful for predicting a strain's potential for winemaking. We constituted a S. cerevisiae collection comprising 172 strains of worldwide geographical origins or technological applications. Their phenotypes were screened by considering 30 physiological traits that are important from an oenological point of view. Growth in the presence of potassium bisulphite, growth at 40 °C, and resistance to ethanol contributed most to strain variability, as shown by principal component analysis. In the hierarchical clustering of phenotypic profiles, the strains isolated from the same wines and vineyards were scattered throughout all clusters, whereas commercial winemaking strains tended to co-cluster. The Mann-Whitney test revealed significant associations between phenotypic results and a strain's technological application or origin. A naive Bayesian classifier identified three of the 30 phenotypic tests, growth in iprodione (0.05 mg/mL), cycloheximide (0.1 µg/mL) and potassium bisulphite (150 mg/mL), as providing the most information for the assignment of a strain to the group of commercial strains. The probability of a strain being assigned to this group was 27% using the entire phenotypic profile and increased to 95% when only the results from these three tests were considered. 
    Results show the usefulness of computational approaches to simplify strain selection procedures.
    Ines Mendes and Ricardo Franco-Duarte are recipients of fellowships from the Portuguese Science Foundation, FCT (SFRH/BD/74798/2010 and SFRH/BD/48591/2008, respectively), and Joao Drumonde-Neves is the recipient of a fellowship from the Azores government (M3.1.2/F/006/2008 (DRCT)). Financial support was obtained from FEDER funds through the program COMPETE and by national funds through FCT via the projects FCOMP-01-0124-008775 (PTDC/AGR-ALI/103392/2008) and PTDC/AGR-ALI/121062/2010. Lan Umek and Blaz Zupan acknowledge financial support from the Slovene Research Agency (P2-0209). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
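The naive Bayesian classification step can be sketched on binary growth/no-growth profiles. The data below are invented for illustration; only the three test names come from the abstract, and the probabilities produced here are not the paper's.

```python
import numpy as np

# Invented binary profiles (1 = growth under the test condition).
tests = ["iprodione 0.05 mg/mL", "cycloheximide 0.1 ug/mL", "KHSO3 150 mg/mL"]
X = np.array([[1, 0, 1], [1, 0, 1], [1, 1, 1],    # commercial strains
              [0, 1, 0], [0, 0, 0], [0, 1, 0]])   # other strains
y = np.array([1, 1, 1, 0, 0, 0])                  # 1 = commercial

def bernoulli_nb_posterior(X, y, x_new, alpha=1.0):
    """P(commercial | profile) under a Laplace-smoothed Bernoulli naive Bayes."""
    log_post = []
    for c in (0, 1):
        Xc = X[y == c]
        theta = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)  # P(test=1 | c)
        ll = np.sum(x_new * np.log(theta) + (1 - x_new) * np.log1p(-theta))
        log_post.append(np.log(len(Xc) / len(X)) + ll)            # prior + likelihood
    log_post = np.array(log_post)
    return float(np.exp(log_post[1] - np.logaddexp(log_post[0], log_post[1])))

# A new strain growing in iprodione and bisulphite but not cycloheximide
# matches the commercial pattern.
p = bernoulli_nb_posterior(X, y, np.array([1, 0, 1]))
```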

    Constant Angular Velocity Regulation for Visually Guided Terrain Following

    Insects use visual cues to control their flight behaviours. By estimating the angular velocity of the visual stimuli and regulating it to a constant value, honeybees can perform a terrain-following task that keeps a certain height above the undulating ground. To mimic this behaviour in a biologically plausible computational structure, this paper presents a new angular velocity decoding model based on the honeybee's behavioural experiments. The model consists of three parts: the texture estimation layer for spatial information extraction, the motion detection layer for temporal information extraction, and the decoding layer, which combines information from the previous layers to estimate the angular velocity. Compared to previous methods in this field, the proposed model produces responses largely independent of the spatial frequency and contrast in grating experiments. An angular velocity based control scheme is proposed to implement the model in a bee simulated with the game engine Unity. Accurate terrain following above patterned ground, and successful flight over irregularly textured terrain, show the model's potential for terrain following by micro unmanned aerial vehicles.
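Why regulating angular velocity yields terrain following can be shown with a minimal control-loop sketch (ours, not the paper's controller): the ventral image's angular velocity is roughly forward speed divided by ground clearance, so holding it at a set point pins the clearance to forward_speed / omega_ref. All names and gains here are illustrative.

```python
def terrain_following_step(height, forward_speed, ground_height, omega_ref, gain=0.5):
    """One control step: estimate the ventral angular velocity as
    forward_speed / clearance, then climb when it exceeds the set point
    omega_ref (too low) and descend when it falls below it (too high)."""
    clearance = height - ground_height
    omega = forward_speed / clearance
    return height + gain * (omega - omega_ref)

# Over flat ground the clearance settles at forward_speed / omega_ref = 2.0.
h, v, omega_ref = 5.0, 2.0, 1.0
for _ in range(200):
    h = terrain_following_step(h, v, 0.0, omega_ref)
```

Because only the ratio of speed to clearance is regulated, the same loop tracks rising and falling terrain without ever measuring absolute height, which is the appeal of the honeybee strategy.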